68 research outputs found

Subgroup Discovery for Defect Prediction

Author: D. Gamberger
N. Lavrač
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2011

Expert-Guided Subgroup Discovery: Methodology and Application

Author: Gamberger D.
Lavrac N.
Publication venue: AI Access Foundation
Publication date: 22/06/2011

This paper presents an approach to expert-guided subgroup discovery. The main step of the subgroup discovery process, the induction of subgroup descriptions, is performed by a heuristic beam search algorithm, using a novel parametrized definition of rule quality which is analyzed in detail. The other important steps of the proposed subgroup discovery process are the detection of statistically significant properties of selected subgroups and subgroup visualization: statistically significant properties are used to enrich the descriptions of induced subgroups, while the visualization shows subgroup properties in the form of distributions of the numbers of examples in the subgroups. The approach is illustrated by the results obtained for a medical problem of early detection of patient risk groups.

arXiv.org e-Print Archive

Crossref

Semantic Subgroup Discovery and Cross-Context Linking for Microarray Data Analysis

Author: A. Koestler
D. Gamberger
D.R. Swanson
F. Železny
I. Petrič
I. Trajkovski
L. Eronen
M. Weeber
P. Sevon
P. Subramanian
S.Y. Kim
V. Podpečan
W. Dubitzky
Publication venue: Springer Berlin Heidelberg
Publication date: 01/01/2012

Crossref

Springer - Publisher Connector

Multilayer clustering: Biomarker driven segmentation of Alzheimer's disease patient population

Author: C. Hinrichs
D. Gamberger
J.B. Langbaum
L. Breiman
M.C. Evans
P.M. Doraiswamy
T. Galili
Publication venue: Springer Fachmedien Wiesbaden GmbH
Publication date: 01/01/2015

Identification of biomarkers for Alzheimer's disease is a difficult challenge for both medical research and data analysis. In this work we present results obtained by applying a novel clustering tool. The goal is to identify subpopulations of Alzheimer's disease (AD) patients that are homogeneous with respect to the available clinical and biological descriptors. The result is a segmentation of the Alzheimer's disease patient population, and it may be expected that within each subpopulation it will be easier to identify connections between clinical and biological descriptors. Evaluation of the obtained clusters of AD subpopulations showed that for two of them the relevant biological measurements (whole brain volume and intracerebral volume) change in opposite directions. If this observation holds, it would mean that the diagnosed severe dementia problems result from different physiological processes. The observation may have substantial consequences for medical research and clinical trial design. The clustering methodology used may also be of interest for other medical and biological domains.

Crossref

Full-text Institutional Repository of the Ruđer Bošković Institute

The Effect of Class Noise on Continuous Test Case Selection: A Controlled Experiment on Industrial Data

Author: B Frénay
B Sluban
D Gamberger
DF Nettleton
F Pedregosa
GH John
H Hata
J Abellán
JA Sáez
Q Zhao
S Kim
X Zhu
Publication date: 01/01/2020

Continuous integration and testing produce a large amount of data about defects in code revisions, which can be used to train a predictive learner to select a subset of test suites effectively. One challenge in using predictive learners is noise in the training data, which often degrades classification performance. This study examines the impact of one type of noise, called class noise, on a learner’s ability to select test cases. Understanding the impact of class noise on the performance of a learner for test case selection would help testers decide on the appropriateness of different noise-handling strategies. For this purpose, we design and implement a controlled experiment using an industrial data set to measure the impact of class noise at six different levels on the predictive performance of a learner. We measure the learning performance using the Precision, Recall, F-score, and Matthews Correlation Coefficient (MCC) metrics. The results show a statistically significant relationship between class noise and the learner’s performance for test case selection. In particular, a significant difference between the three performance measures (Precision, F-score, and MCC) under all six noise levels and at the 0% level was found, whereas a similar relationship between recall and class noise was found at a level above 30%. We conclude that higher class noise ratios lead to missing more tests in the predicted subset of the test suite and increase the rate of false alarms when the class noise ratio exceeds 30%.

Crossref

Chalmers Research
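
The four measures named in the abstract of the test case selection study above (Precision, Recall, F-score, MCC) can all be derived from confusion-matrix counts. The following is a minimal sketch, not the study's own code: the function names `classification_metrics` and `flip_labels` are hypothetical, and flipping a fraction of binary labels is only one common way to simulate class noise. The abstract does not say how the study injected noise.

```python
import math
import random


def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F-score (F1) and MCC from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if (precision + recall) else 0.0)
    # MCC is undefined when any marginal total is zero; return 0.0 in that case.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall,
            "f_score": f_score, "mcc": mcc}


def flip_labels(labels, noise_ratio, seed=0):
    """Assumed noise model: flip a fixed fraction of binary (0/1) labels."""
    rng = random.Random(seed)
    n_flip = round(noise_ratio * len(labels))
    noisy = list(labels)
    for i in rng.sample(range(len(labels)), n_flip):
        noisy[i] = 1 - noisy[i]
    return noisy


if __name__ == "__main__":
    print(classification_metrics(tp=40, fp=10, fn=20, tn=30))
    print(flip_labels([0, 1] * 5, noise_ratio=0.3))
```

Training the same learner on labels passed through `flip_labels` at increasing ratios and comparing the resulting metrics is one way to reproduce the general shape of such an experiment on other data.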

Preceding rule induction with instance reduction methods

Author: A. Lukasz
D. Gamberger
D.L. Wilson
D.R. Wilsson
D.T. Pham
D.W. Aha
G.L. Ritter
G.W. Gates
I. Tomek
J. Fürnkranz
K. Grudzinski
K. Grudziński
K. Hindi El
K.P. Zhao
O. Othman
P. Clark
P.E. Hart
R. Kohavi
R. Schapire
S. Weiss
T.M. Mitchell
W. Cohen
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

A new prepruning technique for rule induction is presented which applies instance reduction before rule induction. An empirical evaluation records the predictive accuracy and size of rule-sets generated from 24 datasets from the UCI Machine Learning Repository. Three instance reduction algorithms (Edited Nearest Neighbour, AllKnn and DROP5) are compared. Each one is used to reduce the size of the training set, prior to inducing a set of rules using Clark and Boswell's modification of CN2. A hybrid instance reduction algorithm (composed of AllKnn and DROP5) is also tested. For most of the datasets, pruning the training set using ENN, AllKnn or the hybrid significantly reduces the number of rules generated by CN2, without adversely affecting the predictive performance. The hybrid achieves the highest average predictive accuracy.

CiteSeerX

University of Salford Institutional Repository

Crossref
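
The instance reduction step described in the abstract above can be illustrated with Edited Nearest Neighbour (ENN), the simplest of the three algorithms it names. The sketch below assumes Wilson's usual editing rule (drop an instance when its class disagrees with the majority class of its k nearest neighbours) with Euclidean distance and k = 3. These are assumptions, since the abstract does not state the paper's settings. The function name `edited_nearest_neighbour` is hypothetical, and AllKnn, DROP5 and CN2 are not implemented here.

```python
import math
from collections import Counter


def edited_nearest_neighbour(X, y, k=3):
    """Return the indices of instances kept by the ENN editing rule.

    X is a sequence of numeric feature tuples, y the matching class labels.
    """
    keep = []
    for i, xi in enumerate(X):
        # The k nearest neighbours of instance i, excluding i itself.
        neighbours = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(xi, X[j]),
        )[:k]
        majority = Counter(y[j] for j in neighbours).most_common(1)[0][0]
        # Keep the instance only if its neighbourhood agrees with its label.
        if majority == y[i]:
            keep.append(i)
    return keep


if __name__ == "__main__":
    X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
         (5, 5), (5, 6), (6, 5), (6, 6)]
    y = ["a", "a", "a", "a", "b", "b", "b", "b", "b"]
    kept = edited_nearest_neighbour(X, y)
    # The reduced training set would then be handed to a rule learner.
    print([(X[i], y[i]) for i in kept])
```

In this toy data the point (0.5, 0.5) is labelled "b" but sits inside the "a" cluster, so the editing rule removes it and keeps the rest.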

Using ILP to Identify Pathway Activation Patterns in Systems Biology

Author: A Subramanian
AL Tarca
C Perlich
D Croft
D Gamberger
JJ Tyson
K Rhrissorrakrai
K Whelan
L Danon
L Dehaspe
L Raedt De
M Holec
MN McCall
MVM França
N Lavrač
O Kuželka
P Ristoski
PA Flach
R Edgar
R-S Wang
S Draghici
W Kim
W Rongrong
X Robin
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

We show a logical aggregation method that, combined with propositionalization methods, can construct novel structured biological features from gene expression data. We do this to gain understanding of pathway mechanisms, for instance, those associated with a particular disease. We illustrate this method on the task of distinguishing between two types of lung cancer: Squamous Cell Carcinoma (SCC) and Adenocarcinoma (AC). We identify pathway activation patterns in pathways previously implicated in the development of cancers. Our method identified a model with comparable predictive performance to the winning algorithm of a recent challenge, while providing biologically relevant explanations that may be useful to a biologist.

Crossref

PubMed Central

King's Research Portal

Explore Bristol Research

Improved comprehensibility and reliability of explanations via restricted halfspace discretization

Author: A. An
D. Gamberger
E. Boros
E. Triantaphyllou
G. Felici
G.A. Miller
G.S. Halford
H. Liu
I. Guyon
J. Quinlan
L.A. Kurgan
M. Atzmueller
M. Boullé
M.R. Chmielewski
N. Cowan
N. Lavrač
P. Perner
S. Bartnikowski
S. Bay
U. Fayyad
V. Vapnik
W. Klösgen
W.-H. Au
Y. Yang
Publication date: 01/01/2009

A number of two-class classification methods first discretize each attribute of two given training sets and then construct a propositional DNF formula that evaluates to True for one of the two discretized training sets and to False for the other one. The formula is not just a classification tool but constitutes a useful explanation for the differences between the two underlying populations if it can be comprehended by humans and is reliable. This paper shows that comprehensibility as well as reliability of the formulas can sometimes be improved using a discretization scheme where linear combinations of a small number of attributes are discretized.

CiteSeerX

Crossref

Main findings and advances in biomedical engineering and bioinformatics from IWBBIO 2015

Author: A Gimenez
B Macdonald
D Gamberger
F Larriba
Francisco M. Ortuño
G Molina-Recio
Ignacio Rojas
J Ortega
LM Soria-Morillo
M Abbasi
O Banos
Olga Valenzuela
P Cisar
Peter Glösekötter
Publication venue: Springer Science and Business Media LLC

Crossref

Mining Exceptional Social Behaviour

Author: AM Jorge
B Škrlj
C Rebelo de Sá
C Romero
D Gamberger
D Leman
DS Messinger
F Berlanga
F Herrera
H Grosskreutz
HW Lauw
I Altman
JA Bondy
JF Roddick
JM Kleinberg
L Cabrera-Quiros
M Atzmueller
M McPherson
M. E. J. Newman
N Delener
N Owen
S Wasserman
S Wrobel
W Klösgen
Publication venue: EPIA 2019 proceedings, Part II, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publication date: 01/01/2019

Essentially, our lives are made of social interactions. These can be recorded through personal gadgets as well as sensors adequately attached to people for research purposes. In particular, such sensors may record the real-time location of people. This location data can then be used to infer interactions, which may be translated into behavioural patterns. In this paper, we focus on the automatic discovery of exceptional social behaviour from spatio-temporal data. For that, we propose a method for Exceptional Behaviour Discovery (EBD). The proposed method combines Subgroup Discovery and Network Science techniques for finding social behaviour that deviates from the norm. In particular, it transforms movement and demographic data into attributed social interaction networks, and returns descriptive subgroups. We applied the proposed method to two real datasets containing location data from children playing in the school playground. Our results indicate that this is a valid approach which is able to obtain meaningful knowledge from the data. This work has been partially supported by the German Research Foundation (DFG) project “MODUS” (under grant AT 88/4-1). Furthermore, the research leading to these results has received funding (JG) from ESRC grant ES/N006577/1. This work was financed by the project Kids First, project number 68639.

Crossref

Apollo (Cambridge)

University of Twente Research Information